Purpose: The aim of this study was to demonstrate the utility of unsupervised domain adaptation (UDA) in automated knee osteoarthritis (OA) phenotype classification using a small dataset (n=50). Materials and Methods: For this retrospective study, we collected 3,166 three-dimensional (3D) double-echo steady-state magnetic resonance (MR) images from the Osteoarthritis Initiative dataset and 50 3D turbo/fast spin-echo MR images from our institute (in 2020 and 2021) as the source and target datasets, respectively. For each patient, the degree of knee OA was initially graded according to the MRI Osteoarthritis Knee Score (MOAKS) before being converted to binary OA phenotype labels. The proposed UDA pipeline included (a) pre-processing, which involved automatic segmentation and region-of-interest cropping; (b) source classifier training, which involved pre-training phenotype classifiers on the source dataset; (c) target encoder adaptation, which involved unsupervised adaptation of the source encoder to the target encoder; and (d) target classifier validation, which involved statistical analysis of the target classification performance evaluated by the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity and accuracy. Additionally, a classifier was trained without UDA for comparison. Results: The target classifier trained with UDA achieved improved AUROC, sensitivity, specificity and accuracy for both knee OA phenotypes compared with the classifier trained without UDA. Conclusion: The proposed UDA approach improves the performance of automated knee OA phenotype classification for small target datasets by utilising a large, high-quality source dataset for training. The results successfully demonstrated the advantages of the UDA approach in classification on small datasets.
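The abstract specifies step (c) only at a high level. A minimal sketch of one standard way to realize it, adversarial discriminative adaptation (ADDA-style), is shown below; the encoder, discriminator, and loader names are hypothetical placeholders, not the paper's code.

```python
import torch
import torch.nn as nn

def adapt_target_encoder(source_encoder, target_encoder, discriminator,
                         source_loader, target_loader, epochs=20, lr=1e-4):
    """ADDA-style adaptation: align target features with frozen source features.

    Hypothetical sketch -- the paper's exact UDA objective may differ.
    """
    source_encoder.eval()                      # source encoder stays frozen
    bce = nn.BCEWithLogitsLoss()
    opt_e = torch.optim.Adam(target_encoder.parameters(), lr=lr)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=lr)

    for _ in range(epochs):
        for (xs, _), (xt, _) in zip(source_loader, target_loader):
            with torch.no_grad():
                fs = source_encoder(xs)        # "real" source features
            ft = target_encoder(xt)            # "fake" target features

            # 1) discriminator learns to tell source from target features
            ds, dt = discriminator(fs), discriminator(ft.detach())
            d_loss = bce(ds, torch.ones_like(ds)) + bce(dt, torch.zeros_like(dt))
            opt_d.zero_grad(); d_loss.backward(); opt_d.step()

            # 2) target encoder learns to fool the discriminator
            dt2 = discriminator(ft)
            e_loss = bce(dt2, torch.ones_like(dt2))
            opt_e.zero_grad(); e_loss.backward(); opt_e.step()
    return target_encoder
```

In a scheme like this, the adapted target encoder is then paired with the source-trained classifier head for the validation step (d).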
Recently, domain generalization (DG) person re-identification (ReID) has attracted much attention because supervised person ReID generalizes poorly; its aim is to learn a domain-insensitive model that can resist domain bias. In this paper, we first verify experimentally that style factors are a major component of domain bias. Based on this conclusion, we propose a Style Variable and Irrelevant Learning (SVIL) method to eliminate the effect of style factors on the model. Specifically, we design a Style Jitter Module (SJM) in SVIL. The SJM module enriches the style diversity within a specific source domain and reduces the style differences across source domains. This leads the model to focus on identity-relevant information and become insensitive to style variations. In addition, we organically combine the SJM module with a meta-learning algorithm, maximizing its benefits and further improving the model's generalization ability. Note that our SJM module is plug-and-play and adds no inference cost. Extensive experiments confirm the effectiveness of our SVIL, and our method outperforms state-of-the-art methods on DG-ReID benchmarks.
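The abstract does not spell out the SJM's formulation. As a hedged sketch of the general idea of jittering style in feature space, the module below mixes per-instance feature statistics (in the spirit of MixStyle/AdaIN); it is an assumption for illustration, not the paper's SJM.

```python
import torch
import torch.nn as nn

class StyleJitter(nn.Module):
    """Perturb per-instance feature statistics (a common proxy for 'style').

    Hypothetical stand-in for the paper's SJM; the exact design may differ.
    """
    def __init__(self, alpha=0.1, eps=1e-6):
        super().__init__()
        self.beta = torch.distributions.Beta(alpha, alpha)
        self.eps = eps

    def forward(self, x):                       # x: (N, C, H, W)
        if not self.training:
            return x                            # plug-and-play: no inference cost
        mu = x.mean(dim=(2, 3), keepdim=True)
        sig = (x.var(dim=(2, 3), keepdim=True) + self.eps).sqrt()
        x_norm = (x - mu) / sig
        # mix each instance's style statistics with a shuffled instance's,
        # enriching style diversity while keeping identity content intact
        perm = torch.randperm(x.size(0))
        lam = self.beta.sample((x.size(0), 1, 1, 1)).to(x.device)
        mu_mix = lam * mu + (1 - lam) * mu[perm]
        sig_mix = lam * sig + (1 - lam) * sig[perm]
        return x_norm * sig_mix + mu_mix
```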
Existing autonomous driving pipelines separate the perception module from the prediction module. The two modules communicate via hand-picked features such as agent boxes and trajectories as the interface. Because of this separation, the prediction module receives only partial information from the perception module. Worse still, errors from the perception module propagate and accumulate, adversely affecting the prediction results. In this work, we propose ViP3D, a visual trajectory prediction pipeline that leverages the rich information in raw videos to predict the future trajectories of agents in a scene. ViP3D employs sparse agent queries throughout the pipeline, making it fully differentiable and interpretable. In addition, we propose an evaluation metric for this novel end-to-end visual trajectory prediction task. Extensive experimental results on the nuScenes dataset show the strong performance of ViP3D over traditional pipelines and previous end-to-end models.
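For intuition about what "sparse agent queries" means, here is a deliberately tiny, hypothetical query-based head: learned queries cross-attend to image features, and each query regresses a future trajectory plus an agent-ness score. ViP3D's real design (multi-camera 3D queries with tracking) is far richer; all names below are illustrative.

```python
import torch
import torch.nn as nn

class QueryTrajectoryHead(nn.Module):
    """Toy query-based trajectory decoder (illustrative only, not ViP3D)."""
    def __init__(self, num_queries=32, dim=256, horizon=12):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.traj_head = nn.Linear(dim, horizon * 2)   # (x, y) per future step
        self.score_head = nn.Linear(dim, 1)            # agent-ness score

    def forward(self, feats):                          # feats: (B, HW, dim)
        q = self.queries.unsqueeze(0).expand(feats.size(0), -1, -1)
        q, _ = self.attn(q, feats, feats)              # queries attend to pixels
        traj = self.traj_head(q)                       # (B, Q, horizon*2)
        score = self.score_head(q).squeeze(-1)         # (B, Q)
        return traj.view(*traj.shape[:2], -1, 2), score
```

Because every step from pixels to trajectories is a differentiable tensor operation, gradients from the prediction loss can flow back into perception, which is the point of removing the hand-picked interface.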
Sparse tensors are rapidly becoming critical components of modern deep learning workloads. However, developing high-performance sparse operators can be difficult and tedious, and existing vendor libraries cannot satisfy the escalating demands of new operators. Sparse tensor compilers simplify operator development, but efficient sparse compilation for deep learning remains challenging, because a single sparse format cannot maximize hardware efficiency and single-shot compilers cannot keep up with the latest hardware and system advances. We show that the key to addressing both challenges is two forms of composability. In this paper, we propose SparseTIR, a sparse tensor compilation abstraction that offers composable formats and composable transformations for deep learning workloads. SparseTIR constructs a search space over these composable components for performance tuning. With these improvements, SparseTIR obtains consistent speedups on GPUs over vendor libraries for single operators: 1.1-3.3x for GNN operators and 1.1-4.4x for sparse Transformer operators. SparseTIR also accelerates end-to-end GNNs by 1.1-2.2x for graph training and by 0.9-26x for RGCN inference.
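To make "composable formats" concrete, the sketch below decomposes a sparse matrix into a regular ELL part plus a COO remainder and runs SpMM over both pieces. This illustrates hybrid-format decomposition in plain NumPy, whereas SparseTIR composes such formats inside a compiler IR; the function name and ELL width are invented for the example.

```python
import numpy as np
import scipy.sparse as sp

def hybrid_spmm(A_csr, B, ell_width=4):
    """SpMM with A decomposed into ELL (first k nnz per row) + COO remainder.

    Illustrates hybrid-format decomposition in spirit only; SparseTIR does
    this inside a tensor compiler, not in NumPy.
    """
    n = A_csr.shape[0]
    coo = A_csr.tocoo()
    order = np.lexsort((coo.col, coo.row))
    rows, cols, vals = coo.row[order], coo.col[order], coo.data[order]

    # position of each nonzero within its row
    row_start = np.searchsorted(rows, np.arange(n))
    pos = np.arange(len(rows)) - row_start[rows]

    in_ell = pos < ell_width
    C = np.zeros((n, B.shape[1]), dtype=B.dtype)
    # ELL part: regular structure, friendly to vectorization on real backends
    np.add.at(C, rows[in_ell], vals[in_ell, None] * B[cols[in_ell]])
    # COO remainder: long-tail nonzeros that would bloat a pure ELL format
    np.add.at(C, rows[~in_ell], vals[~in_ell, None] * B[cols[~in_ell]])
    return C

if __name__ == "__main__":
    A = sp.random(512, 512, density=0.05, format="csr")
    C = hybrid_spmm(A, np.ones((512, 16)))
```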
Deploying deep learning models on various devices has become an important topic. The wave of hardware specialization brings a diverse set of acceleration primitives for multi-dimensional tensor computations. These new acceleration primitives, together with emerging machine learning models, pose tremendous engineering challenges. In this paper, we present TensorIR, a compiler abstraction for optimizing programs with these tensor computation primitives. TensorIR generalizes the loop-nest representation used in existing machine learning compilers to make tensor computation a first-class citizen. Finally, we build an end-to-end framework on top of this abstraction to automatically optimize deep learning models for given tensor computation primitives. Experimental results show that TensorIR compilation automatically uses the tensor computation primitives of a given hardware backend and delivers performance competitive with state-of-the-art hand-optimized systems across platforms.
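TensorIR is the abstraction that ships in Apache TVM, and its script form makes the first-class tensor-computation blocks visible. A minimal matmul in that form looks roughly like the following (exact syntax varies across TVM versions):

```python
from tvm.script import tir as T

@T.prim_func
def matmul(a: T.handle, b: T.handle, c: T.handle):
    A = T.match_buffer(a, (128, 128), "float32")
    B = T.match_buffer(b, (128, 128), "float32")
    C = T.match_buffer(c, (128, 128), "float32")
    for i, j, k in T.grid(128, 128, 128):
        # the named block is a first-class unit the compiler can transform
        # and match against hardware tensor intrinsics
        with T.block("C"):
            vi, vj, vk = T.axis.remap("SSR", [i, j, k])  # S: spatial, R: reduction
            with T.init():
                C[vi, vj] = T.float32(0)
            C[vi, vj] = C[vi, vj] + A[vi, vk] * B[vk, vj]
```

The block and its axis annotations are what let the auto-optimization framework rewrite the loop nest onto a backend's tensor computation primitives without changing the computation's semantics.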
Natural language (NL)-based vehicle retrieval aims to search for a specific vehicle given a text description. Unlike image-based vehicle retrieval, NL-based vehicle retrieval must consider not only vehicle appearance but also the surrounding environment and temporal relations. In this paper, we propose a Symmetric network with Spatial relationship Modeling (SSM) for NL-based vehicle retrieval. Specifically, we design a symmetric network to learn unified cross-modal representations of text descriptions and vehicle images, preserving both vehicle appearance details and global trajectory information. In addition, to make better use of location information, we propose a spatial relationship modeling method that takes the surrounding environment and mutual relationships into account. Qualitative and quantitative experiments verify the effectiveness of the proposed method. We achieve 43.92% MRR accuracy on the test set of the 6th AI City Challenge natural-language-based vehicle retrieval track, ranking first among all valid submissions on the public leaderboard. The code is available at https://github.com/hbchen121/aicity2022_track2_ssm.
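At its skeleton, a symmetric cross-modal retrieval network is a dual encoder trained with a symmetric contrastive loss over matched image/text pairs. A minimal sketch of that loss is below; the spatial-relation branch is omitted, and the temperature value is an assumed default rather than the paper's setting.

```python
import torch
import torch.nn.functional as F

def symmetric_info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric image<->text contrastive loss over a batch of matched pairs.

    img_emb, txt_emb: (N, D) embeddings of paired vehicle images and texts.
    Sketch only; SSM's full objective and architecture are richer.
    """
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature     # (N, N) similarity matrix
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    # cross-entropy in both retrieval directions, hence "symmetric"
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```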
This work targets designing a principled and unified training-free framework for Neural Architecture Search (NAS), with high performance, low cost, and in-depth interpretation. NAS has been explosively studied to automate the discovery of top-performer neural networks, but suffers from heavy resource consumption and often incurs search bias due to truncated training or approximations. Recent NAS works start to explore indicators that can predict a network's performance without training. However, they either leveraged limited properties of deep networks, or the benefits of their training-free indicators are not applied to more extensive search methods. By rigorous correlation analysis, we present a unified framework to understand and accelerate NAS, by disentangling "TEG" characteristics of searched networks - Trainability, Expressivity, Generalization - all assessed in a training-free manner. The TEG indicators could be scaled up and integrated with various NAS search methods, including both supernet and single-path approaches. Extensive studies validate the effective and efficient guidance from our TEG-NAS framework, leading to both improved search accuracy and over 56% reduction in search time cost. Moreover, we visualize search trajectories on three landscapes of "TEG" characteristics, observing that while a good local minimum is easier to find on NAS-Bench-201 given its simple topology, balancing "TEG" characteristics is much harder on the DARTS search space due to its complex landscape geometry. Our code is available at https://github.com/VITA-Group/TEGNAS.
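For a flavor of what a training-free indicator looks like in practice, the sketch below computes a different but related proxy, the ReLU activation-pattern kernel score of NASWOT (Mellor et al.); like the TEG indicators, it needs only one forward pass over a batch and no training. It is not TEG-NAS's own metric.

```python
import torch

def naswot_score(model, batch):
    """Log-determinant of the binary ReLU activation-pattern kernel.

    A training-free proxy (Mellor et al.), shown for flavor only; TEG-NAS
    assesses trainability, expressivity, and generalization instead.
    """
    patterns = []

    def hook(_, __, out):
        patterns.append((out > 0).flatten(1).float())

    handles = [m.register_forward_hook(hook)
               for m in model.modules() if isinstance(m, torch.nn.ReLU)]
    with torch.no_grad():
        model(batch)
    for h in handles:
        h.remove()

    codes = torch.cat(patterns, dim=1)        # (N, total_relu_units)
    # kernel entry = number of agreeing activations between two inputs;
    # a larger log-determinant means more distinguishable patterns
    k = codes @ codes.t() + (1 - codes) @ (1 - codes).t()
    return torch.slogdet(k).logabsdet.item()
```

Scores like this can rank thousands of candidate architectures in minutes, which is what enables the reported cut in search time.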
Predicting the future trajectories of road agents is challenging for autonomous driving due to the stochasticity of human behavior. Recently, goal-based multi-trajectory prediction methods have proven effective: they first score over-sampled goal candidates and then select a final set from them. However, these methods usually rely on goal predictions based on sparse, predefined anchors and heuristic goal selection algorithms. In this work, we propose an anchor-free, end-to-end trajectory prediction model named DenseTNT, which outputs a set of trajectories directly from dense goal candidates. In addition, we introduce an offline-optimization-based technique to provide multi-future pseudo-labels for our final online model. Experiments show that DenseTNT achieves state-of-the-art performance, ranking first on the Argoverse motion forecasting benchmark and winning first place in the 2021 Waymo Open Dataset Motion Prediction Challenge.
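The selection stage of a dense-goal method can be caricatured in a few lines: score a dense grid of candidate endpoints, then greedily keep a small, spatially diverse, high-scoring subset. The sketch below is that caricature (DenseTNT instead learns the goal set end to end with offline-optimized pseudo-labels); the radius and goal count are made-up values.

```python
import numpy as np

def select_goals(candidates, scores, num_goals=6, radius=2.0):
    """Greedy NMS-style selection of diverse high-score goals.

    candidates: (M, 2) dense grid of candidate endpoints (metres).
    scores:     (M,) predicted goal probabilities.
    Toy stand-in for DenseTNT's learned goal-set predictor.
    """
    picked = []
    for i in np.argsort(-scores):              # best-scoring candidates first
        if len(picked) == num_goals:
            break
        # suppress candidates too close to an already-picked goal
        if all(np.linalg.norm(candidates[i] - candidates[j]) > radius
               for j in picked):
            picked.append(i)
    return candidates[picked], scores[picked]
```

A trajectory decoder then completes one full trajectory per selected goal, which is how a small set of goals yields a multi-modal prediction.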
Neural architecture search (NAS) often trains and evaluates a large number of architectures. Recent predictor-based NAS methods attempt to alleviate this heavy computational cost in two key steps: sampling some architecture-performance pairs and fitting a proxy accuracy predictor. Because the huge search space is hard to fit, however, such predictors are far from accurate at locating top architectures. This paper reflects on a simple but important question: if our ultimate goal is to find the best architecture, do we really need to fit the whole space well? We propose a paradigm shift from fitting the entire architecture space with one strong predictor to progressively fitting a search path towards the high-performance sub-space with a set of weaker predictors. As a key property of the weak predictors, their probability of sampling better architectures keeps increasing. We therefore only sample a few good architectures guided by the previously learned predictor and estimate a new, better weak predictor. This embarrassingly easy framework, dubbed WeakNAS, produces coarse-to-fine iterations that gradually refine the ranking of the sampling space. Extensive experiments show that WeakNAS finds top-performance architectures with fewer samples on NAS-Bench-101 and NAS-Bench-201. Compared with state-of-the-art (SOTA) predictor-based NAS methods, WeakNAS consistently wins by significant margins, e.g., requiring at least 7.5x fewer samples to find the global optimum on NAS-Bench-101; it can also absorb their ideas to improve performance further. Moreover, WeakNAS sets a new SOTA result of 81.3% in the ImageNet MobileNet search space. The code is available at https://github.com/vita-group/weaknas.
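The coarse-to-fine loop itself is simple enough to sketch end to end. Below is a toy version over an abstract search space, with a random-forest regressor standing in for each weak predictor; the predictor family, encoding, and sampling rule are assumptions, not the paper's exact recipe.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def weak_predictor_search(encode, evaluate, space, rounds=5, per_round=20):
    """Progressively refit weak predictors towards the high-performance region.

    encode(arch) -> feature vector; evaluate(arch) -> true accuracy (costly).
    `space` is the list of candidate architectures. Toy WeakNAS-style loop.
    """
    rng = np.random.default_rng(0)
    sampled = list(rng.choice(len(space), per_round, replace=False))
    history = {i: evaluate(space[i]) for i in sampled}

    for _ in range(rounds - 1):
        X = np.stack([encode(space[i]) for i in history])
        y = np.array([history[i] for i in history])
        predictor = RandomForestRegressor(n_estimators=100).fit(X, y)
        # rank the whole space with the current weak predictor ...
        preds = predictor.predict(np.stack([encode(a) for a in space]))
        # ... and evaluate only a few promising, not-yet-seen architectures,
        # so each refit sees a better-and-better sampling region
        new = [i for i in np.argsort(-preds) if i not in history][:per_round]
        for i in new:
            history[i] = evaluate(space[i])

    best = max(history, key=history.get)
    return space[best], history[best]
```

Each predictor only needs to rank its local region well, which is why a sequence of weak predictors can beat one strong predictor fitted to the whole space.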
Self-supervised pre-training recently demonstrates success on large-scale multimodal data, and state-of-the-art contrastive learning methods often enforce feature consistency between cross-modality inputs, such as video/audio or video/text pairs. Despite its convenience to formulate and leverage in practice, such cross-modality alignment (CMA) is only a weak and noisy supervision, since two modalities can be semantically misaligned even when they are temporally aligned. For example, even in the commonly adopted instructional videos, a speaker can sometimes refer to something that is not visually present in the current frame; and the semantic misalignment would only be more unpredictable for raw videos from the internet. We conjecture that this might cause conflicts and biases among modalities, and may hence prohibit CMA from scaling up to training with larger and more heterogeneous data. This paper first verifies our conjecture by observing that, even in the latest VATT pre-training using only instructional videos, there exist strong gradient conflicts between different CMA losses within the same (video, audio, text) triplet, indicating them as a noisy source of supervision. We then propose to harmonize such gradients via two techniques: (i) cross-modality gradient realignment: modifying different CMA loss gradients for each sample triplet so that their gradient directions are more aligned; and (ii) gradient-based curriculum learning: leveraging the gradient conflict information as an indicator of sample noisiness, to develop a curriculum learning strategy that prioritizes training on less noisy sample triplets. Applying these techniques to pre-training VATT on the HowTo100M dataset, we consistently improve its performance on different downstream tasks. Moreover, we are able to scale VATT pre-training to the more complicated, non-narrative YouTube8M dataset to further improve the state of the art.
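The heart of technique (i) is projecting away the conflicting component between two loss gradients. A minimal sketch of that projection, in the spirit of PCGrad applied to a pair of CMA loss gradients, is below; the paper operates per sample triplet, and its exact realignment rule may differ.

```python
import torch

def realign_gradients(g1, g2):
    """If two flattened loss gradients conflict (negative dot product),
    project each onto the normal plane of the other (PCGrad-style).

    Sketch of the idea only, not the paper's exact update rule.
    """
    dot = torch.dot(g1, g2)
    if dot < 0:                                   # gradient directions conflict
        g1_new = g1 - dot / g2.norm().pow(2) * g2 # remove component along g2
        g2_new = g2 - dot / g1.norm().pow(2) * g1 # remove component along g1
        return g1_new, g2_new
    return g1, g2
```

The magnitude of that negative dot product is also the kind of per-sample conflict signal that technique (ii) can use to rank triplets by noisiness for curriculum learning.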